where [α, β] is
the clip range and b is the bit-width. The clipping range, [α, β], determines the range of real
values that should be quantized. The choice of this range is crucial, as it determines the
quantization’s precision and the quantized model’s overall quality. This process is known as
calibration, an important step in uniform quantization. The clipping range can be tighter
in asymmetric quantization than in symmetric quantization. This is especially important
for signals with imbalanced values, like activations after ReLU, which always have non-
negative values. Furthermore, symmetric quantization simplifies the quantization function by fixing the zero point at Z = 0, so that a real value x is quantized as follows:
\begin{equation}
q_x = \mathrm{INT}\!\left(\frac{x}{S}\right). \tag{2.7}
\end{equation}
In general, the full-range variant of symmetric quantization, which uses the entire signed integer range, provides greater accuracy than the restricted-range variant. Symmetric quantization is commonly used for quantizing weights due to its simplicity and reduced computational cost during inference. However, asymmetric quantization may be more effective for activations, because the offset (the zero point) can be absorbed into the bias or used to initialize the accumulator.
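To make the distinction concrete, the sketch below implements both variants of uniform quantization in NumPy. It is a minimal illustration, not taken from the text: the function names (symmetric_quantize, asymmetric_quantize) and the per-tensor min/max calibration are assumptions, the symmetric variant uses the restricted integer range for simplicity, and the clip range [α, β] is read directly from the data being quantized.

```python
import numpy as np

def symmetric_quantize(x, b=8):
    """Symmetric uniform quantization: zero point Z = 0, as in Eq. (2.7).

    Uses the restricted integer range [-(2^(b-1)-1), 2^(b-1)-1] for simplicity.
    """
    alpha = np.max(np.abs(x))                         # calibration: clip range [-alpha, alpha]
    S = alpha / (2 ** (b - 1) - 1)                    # scale mapping reals to b-bit integers
    q = np.clip(np.round(x / S), -(2 ** (b - 1) - 1), 2 ** (b - 1) - 1)
    return q.astype(np.int32), S

def asymmetric_quantize(x, b=8):
    """Asymmetric uniform quantization: clip range [alpha, beta] = [min, max]."""
    alpha, beta = np.min(x), np.max(x)                # calibration on the observed data
    S = (beta - alpha) / (2 ** b - 1)                 # scale covers the full (possibly skewed) range
    Z = int(np.round(-alpha / S))                     # zero point offsets the integer grid
    q = np.clip(np.round(x / S) + Z, 0, 2 ** b - 1)
    return q.astype(np.int32), S, Z

# Example: ReLU activations are non-negative, so the asymmetric clip range is tighter.
acts = np.maximum(np.random.randn(1000).astype(np.float32), 0.0)
q_sym, s_sym = symmetric_quantize(acts)
q_asym, s_asym, z = asymmetric_quantize(acts)
print(s_sym, s_asym)
```

For these non-negative activations, the asymmetric scale comes out roughly half the symmetric one, meaning the same bit-width resolves the signal about twice as finely, which is exactly the benefit of the tighter clipping range described above.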
2.2 LSQ: Learned Step Size Quantization
Fixed quantization methods that rely on user-defined settings do not guarantee optimal
network performance and may still produce suboptimal results even if they minimize quan-
tization error. An alternative approach is learning the quantization mapping by minimizing
task loss, directly improving the desired metric. However, this method is challenging because
the quantizer is discontinuous and its gradient must be approximated accurately; existing methods [43] approximate it only coarsely, overlooking the effects of transitions between quantized states.
This section introduces Learned Step Size Quantization (LSQ) [61], a method for learning the quantization mapping of each layer in a deep network. LSQ improves on previous methods with two key innovations. First, it provides a simple way to estimate the gradient of the quantizer step size that accounts for the impact of transitions between quantized states, resulting in more refined optimization when the step size is learned as a model parameter. Second, it introduces a heuristic that balances the magnitude of step-size updates with weight updates, leading to improved convergence. The approach can be used to quantize both activations and weights, and is compatible with existing techniques for backpropagation and stochastic gradient descent.
2.2.1 Notations
The goal of quantization in deep networks is to reduce the precision of the weights and activations at inference time in order to increase computational efficiency. Given the data to quantize v, the quantizer step size s, and the number of positive and negative quantization levels (QP and QN), a quantizer computes ¯v, a quantized integer-scaled representation of the data, and ˆv, a quantized representation of the data at the same scale as v:
\begin{equation}
\bar{v} = \lfloor \mathrm{clip}(v/s, -Q_N, Q_P) \rceil \tag{2.8}
\end{equation}
\begin{equation}
\hat{v} = \bar{v} \times s \tag{2.9}
\end{equation}
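As a concrete reference for Eqs. (2.8) and (2.9), the sketch below implements the quantizer forward pass in NumPy. It is a minimal illustration under stated assumptions: the function name lsq_quantize is invented here, the choice of QN = 2^(b-1) and QP = 2^(b-1) - 1 for signed b-bit data (and QN = 0, QP = 2^b - 1 for unsigned data) follows common convention rather than anything stated in this section, and learning the step size s itself requires the gradient approximation discussed next.

```python
import numpy as np

def lsq_quantize(v, s, b=8, signed=True):
    """Forward pass of the quantizer in Eqs. (2.8)-(2.9).

    v: data to quantize (weights or activations)
    s: step size (a positive scalar, learned in LSQ)
    Q_N, Q_P: number of negative / positive quantization levels
    """
    if signed:
        q_n, q_p = 2 ** (b - 1), 2 ** (b - 1) - 1     # e.g. 128 and 127 for b = 8
    else:
        q_n, q_p = 0, 2 ** b - 1                      # e.g. 0 and 255 for ReLU outputs
    v_bar = np.round(np.clip(v / s, -q_n, q_p))       # Eq. (2.8): integer-scaled representation
    v_hat = v_bar * s                                 # Eq. (2.9): back at the same scale as v
    return v_bar, v_hat

# Example: quantize a small weight tensor with a hand-picked step size.
w = np.array([-0.31, -0.05, 0.02, 0.27], dtype=np.float32)
w_bar, w_hat = lsq_quantize(w, s=0.01, b=4)
print(w_bar)  # [-8. -5.  2.  7.]  (clipped to [-Q_N, Q_P] = [-8, 7])
print(w_hat)  # approximately [-0.08 -0.05  0.02  0.07]
```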